This script defines a function that retrieves information on music artists from the Last.fm API and adds that information to our initial dataset. My initial recommendation model relies only on collaborative filtering: it recommends music to a user based on the listening history of other users with similar listening preferences. However, I would also like to recommend artists based on item similarity. To do this, I need to extract the “tags” associated with each artist. Tags are artist classifiers that can more or less be thought of as genres.
The function takes two arguments, “artist” and “api.key”, and returns the tags associated with that artist. It relies on the “httr” and “jsonlite” packages.
Let’s first load the necessary packages, as well as the raw Last.fm data set.
library(httr)
library(jsonlite)
load("~/Documents/NYU/APSTA 2017/EDSP_v2/large data/Lastfm_data.RData")
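Let’s define the function “get.tags”.
get.tags <- function(artist, api.key) {
  # Pass the parameters via `query` so that httr percent-encodes them
  raw <- GET(url = "http://ws.audioscrobbler.com/2.0/",
             query = list(method = "artist.getinfo", artist = as.character(artist),
                          api_key = api.key, format = "json"))
  # Parse the JSON body of the response
  char <- fromJSON(rawToChar(raw$content))
  # Keep the first element of the parsed tag structure (typically the tag names)
  char$artist$tags$tag[1]
}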
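Artist names end up in the query string of the request, so characters such as spaces, “&” or “#” need to be percent-encoded to be read as part of the name. The following is a minimal sketch of what that encoding looks like, using base R's `URLencode`. The helper name `build.tag.url` and the placeholder key `"YOUR_KEY"` are assumptions for illustration only and are not part of the original script.

```r
# Hypothetical helper (not part of the original script): build the request URL
# with a percent-encoded artist name.
build.tag.url <- function(artist, api.key) {
  paste0("http://ws.audioscrobbler.com/2.0/",
         "?method=artist.getinfo",
         "&artist=", utils::URLencode(as.character(artist), reserved = TRUE),
         "&api_key=", api.key,
         "&format=json")
}

# "YOUR_KEY" is a placeholder, not a real API key
url.plain   <- build.tag.url("marvin gaye", "YOUR_KEY")
url.special <- build.tag.url("Simon & Garfunkel", "YOUR_KEY")
print(url.plain)
print(url.special)
```

With `reserved = TRUE`, the “&” inside the artist name is encoded rather than being treated as a separator between query parameters.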
Set the API key.
my.api.key <- "b1824d4815fb7388c8c54df8732f43e8"
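As an alternative to writing the key into the script, it could be read from an environment variable. This is only a sketch: the helper `read.api.key` and the variable name `LASTFM_API_KEY` are hypothetical and do not appear in the original script.

```r
# Hypothetical helper: read the API key from an environment variable
# (the variable name LASTFM_API_KEY is an assumption for this sketch).
read.api.key <- function(var = "LASTFM_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set")
  }
  key
}

# Usage, assuming the variable has been set (for example in ~/.Renviron):
# my.api.key <- read.api.key()
```

This keeps the key out of any shared copy of the notebook.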
Let’s try the “get.tags” function by retrieving the tags for Marvin Gaye.
get.tags("marvin gaye", my.api.key)
We can use this function to merge each artist’s tags into the data. You might recall that the previous “Interactive recommender.Rmd” file subset the artists to include only those played by 10 or more users. Let’s create this subset again, but raise the minimum number of unique users to 100.
Artist.plays.100 <- Artist.plays[which(Artist.plays$N.users >= 100),]
head(Artist.plays.100)
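To illustrate the subsetting step, here is a small sketch on a toy data frame. The toy values are made up, and only the column names “Artist” and “N.users” are taken from the code above. It also shows one reason to wrap the condition in `which()`: rows where the comparison is NA are dropped instead of being returned as rows of NAs.

```r
# Toy data (made-up values) with the two columns used in the subset above
toy <- data.frame(
  Artist  = c("a", "b", "c", "d"),
  N.users = c(250, 40, NA, 100),
  stringsAsFactors = FALSE
)

# which() returns only the positions where the condition is TRUE
with.which <- toy[which(toy$N.users >= 100), ]

# Plain logical indexing keeps an all-NA row for the NA comparison
with.logical <- toy[toy$N.users >= 100, ]

print(with.which)
print(with.logical)
```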
Now, let’s add a fourth column containing a list of the tags associated with each artist, using the “get.tags” function.
Artist.plays.100$tags <- NA
# seq_len() avoids iterating over c(1, 0) if the subset happens to be empty
for (i in seq_len(nrow(Artist.plays.100))) {
  Artist.plays.100$tags[i] <- list(get.tags(Artist.plays.100$Artist[i], my.api.key))
}
Error: lexical error: invalid char in json text.
<?xml version="1.0" encoding="u
(right here) ------^
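The error message shows that the JSON parser received a document starting with `<?xml`, so at least one request returned XML instead of JSON. One possible cause, which I have not verified, is an artist name containing characters that change the query string. Whatever the cause, a single failed request stops the whole loop. The sketch below shows one way to keep going and record the failures with `tryCatch`. The functions `fake.get.tags` and `safe.tags` are hypothetical stand-ins so the example runs without network access. In practice, the fetcher would be a call to “get.tags”.

```r
# Hypothetical stand-in for get.tags(): fails for one artist,
# the way a non-JSON response would make fromJSON() fail.
fake.get.tags <- function(artist) {
  if (artist == "bad#name") stop("lexical error: invalid char in json text.")
  c("soul", "motown")
}

# Wrap a fetcher so that an error yields NA instead of aborting the loop
safe.tags <- function(artist, fetch) {
  tryCatch(fetch(artist), error = function(e) {
    message("Skipping ", artist, ": ", conditionMessage(e))
    NA
  })
}

artists <- c("marvin gaye", "bad#name", "stevie wonder")
tags <- vector("list", length(artists))
for (i in seq_along(artists)) {
  tags[[i]] <- safe.tags(artists[i], fake.get.tags)
}

# Artists whose request failed, for later inspection or a retry
failed <- artists[vapply(tags, function(x) all(is.na(x)), logical(1))]
print(failed)
```

Collecting the failed names makes it possible to inspect exactly which artists produce the bad responses.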